feat: add container and pod security context support for kubernetes workloads #3101
Conversation
Add security_context (container-level) and pod_security_context (pod-level) parameters to the @kubernetes decorator, with environment variable fallbacks METAFLOW_KUBERNETES_SECURITY_CONTEXT and METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT for org-wide defaults. Supports all standard Kubernetes security context fields, including runAsUser, runAsGroup, fsGroup, capabilities, etc. Closes #1262
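As a hypothetical sketch of the org-wide defaults the description mentions: the env var values are JSON, and the review diff shows the parsed dict being unpacked straight into the Python client's V1SecurityContext / V1PodSecurityContext constructors, which take snake_case keyword arguments. The key casing below is therefore an inference from the diff, not documented usage:

```shell
# Hypothetical org-wide defaults; the JSON is parsed with json.loads and
# unpacked into the kubernetes client's V1SecurityContext / V1PodSecurityContext,
# so keys follow the Python client's snake_case constructor kwargs.
export METAFLOW_KUBERNETES_SECURITY_CONTEXT='{"run_as_user": 1000, "allow_privilege_escalation": false}'
export METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT='{"fs_group": 2000, "run_as_non_root": true}'
```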
Greptile Summary
Confidence Score: 4/5

Not safe to merge as-is — pod security context is a no-op for non-parallel Argo Workflow steps due to a snake_case/camelCase mismatch in podSpecPatch serialisation. One P1 bug (the Argo non-parallel path silently ignores pod_security_context) needs to be fixed before merge. Direct K8s job, JobSet, and Argo parallel paths are all correct; the remaining findings are P2.

Important Files Changed

- metaflow/plugins/argo/argo_workflows.py
Reviews (1): Last reviewed commit: "feat: add container and pod security con..."
```python
.pod_spec_patch(
    {
        "securityContext": kubernetes_sdk.V1PodSecurityContext(
            **pod_security_context
        ).to_dict()
    }
    if pod_security_context
    else None
)
```
to_dict() produces snake_case keys — pod security context silently ignored in Argo
V1PodSecurityContext.to_dict() returns Python attribute names in snake_case (e.g. fs_group, run_as_non_root), but podSpecPatch is applied as a raw JSON strategic-merge patch against the Kubernetes API, which requires camelCase (e.g. fsGroup, runAsNonRoot). Kubernetes ignores unrecognised keys, so the pod security context will be accepted without error but have no effect for all non-parallel Argo Workflow steps.
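The mismatch is easy to reproduce without a cluster. The converter below is an illustrative stand-in for metaflow.util.to_camelcase (not its actual implementation), applied to the kind of dict V1PodSecurityContext.to_dict() returns:

```python
def _camel(key):
    # "fs_group" -> "fsGroup", "run_as_non_root" -> "runAsNonRoot"
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_camelcase(obj):
    """Recursively convert snake_case dict keys to camelCase (illustrative
    stand-in for metaflow.util.to_camelcase, not the real implementation)."""
    if isinstance(obj, dict):
        return {_camel(k): to_camelcase(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_camelcase(v) for v in obj]
    return obj


# Shape of V1PodSecurityContext.to_dict() output: snake_case attribute names,
# which the Kubernetes API would silently drop in a strategic-merge patch.
snake = {"fs_group": 2000, "run_as_non_root": True}

print(to_camelcase(snake))  # {'fsGroup': 2000, 'runAsNonRoot': True}
```

Only the converted form matches the PodSecurityContext API schema, which is why the raw to_dict() output is accepted but has no effect.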
The existing code already imports to_camelcase from metaflow.util and uses it for the container spec. Apply the same treatment here:
```diff
-.pod_spec_patch(
-    {
-        "securityContext": kubernetes_sdk.V1PodSecurityContext(
-            **pod_security_context
-        ).to_dict()
-    }
-    if pod_security_context
-    else None
-)
+.pod_spec_patch(
+    {
+        "securityContext": to_camelcase(
+            kubernetes_sdk.V1PodSecurityContext(
+                **pod_security_context
+            ).to_dict()
+        )
+    }
+    if pod_security_context
+    else None
+)
```
```python
pod_security_context = resources.get("pod_security_context", None)
_pod_security_context = {}
if pod_security_context is not None and len(pod_security_context) > 0:
    _pod_security_context = {
        "security_context": kubernetes_sdk.V1PodSecurityContext(
            **pod_security_context
        )
    }
```
Unused _pod_security_context variable — dead code
_pod_security_context is computed here but never referenced. For the parallel path, the raw pod_security_context dict is forwarded to KubernetesArgoJobSet (which handles it via JobSetSpec). For the non-parallel path, the pod security context is applied via pod_spec_patch() further down. This block can be removed.
```python
KUBERNETES_QOS = from_conf("KUBERNETES_QOS", "burstable")
# Default container security context (JSON) for kubernetes pods
KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", "")
```
Inconsistent default: "" instead of None
All adjacent config vars (KUBERNETES_MEMORY, KUBERNETES_DISK, etc.) default to None. Using "" works because the falsiness check in the decorator handles it, but it's inconsistent and slightly harder to reason about. Consider defaulting to None to match the established pattern.
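A minimal sketch of the behavior in question; from_conf here is a simplified stand-in for Metaflow's config reader, not its real implementation:

```python
def from_conf(name, default=None, config=None):
    """Illustrative stand-in for Metaflow's from_conf: return the configured
    value if present, otherwise the supplied default."""
    return (config or {}).get(name, default)


empty_default = from_conf("KUBERNETES_SECURITY_CONTEXT", "")
none_default = from_conf("KUBERNETES_SECURITY_CONTEXT", None)

# Both defaults are falsy, so a downstream `if not value` check behaves
# identically either way...
assert not empty_default and not none_default
# ...but only None unambiguously means "unset" (vs. "set to empty string"),
# which is why matching the adjacent None-defaulting vars is clearer.
assert empty_default == "" and none_default is None
```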
```diff
 KUBERNETES_QOS = from_conf("KUBERNETES_QOS", "burstable")
 # Default container security context (JSON) for kubernetes pods
-KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", "")
+KUBERNETES_SECURITY_CONTEXT = from_conf("KUBERNETES_SECURITY_CONTEXT", None)
+# Default pod security context (JSON) for kubernetes pods
+KUBERNETES_POD_SECURITY_CONTEXT = from_conf("KUBERNETES_POD_SECURITY_CONTEXT", None)
```
```python
# Security context: decorator takes precedence over env var
if not self.attributes["security_context"] and KUBERNETES_SECURITY_CONTEXT:
    self.attributes["security_context"] = json.loads(
        KUBERNETES_SECURITY_CONTEXT
    )
if (
    not self.attributes["pod_security_context"]
    and KUBERNETES_POD_SECURITY_CONTEXT
):
    self.attributes["pod_security_context"] = json.loads(
        KUBERNETES_POD_SECURITY_CONTEXT
    )
```
Explicit security_context={} would be overridden by the env var
not {} evaluates to True, so a user who explicitly sets @kubernetes(security_context={}) would have the env var silently take precedence, contradicting the "decorator takes precedence" comment. The same caveat applies to pod_security_context. Consider comparing to None instead of using truthiness, which is also the pattern used for non-JSON options like port and shared_memory.
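A self-contained sketch of the difference, using a hypothetical env-derived default and a simplified stand-in for the decorator's attribute handling:

```python
import json

# Hypothetical env-derived default (illustrative, not a real config value)
KUBERNETES_SECURITY_CONTEXT = '{"runAsNonRoot": true}'

attributes = {"security_context": {}}  # user explicitly opted out with {}

# Truthiness check: `not {}` is True, so the env var silently wins
if not attributes["security_context"] and KUBERNETES_SECURITY_CONTEXT:
    buggy = json.loads(KUBERNETES_SECURITY_CONTEXT)
else:
    buggy = attributes["security_context"]

# None check: only a genuinely unset attribute falls back to the env var
if attributes["security_context"] is None and KUBERNETES_SECURITY_CONTEXT:
    fixed = json.loads(KUBERNETES_SECURITY_CONTEXT)
else:
    fixed = attributes["security_context"]

print(buggy)  # {'runAsNonRoot': True} — env var overrides the explicit {}
print(fixed)  # {} — the explicit decorator value is preserved
```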
```python
@pytest.fixture
def mock_kubernetes_client():
    """Create a mock Kubernetes client that tracks calls to V1SecurityContext and V1PodSecurityContext."""
    with patch("metaflow.plugins.kubernetes.kubernetes_job.KubernetesJob") as _:
        from kubernetes import client

        yield client
```
Summary

- Add security_context and pod_security_context parameters to the @kubernetes decorator
- Add environment variable fallbacks METAFLOW_KUBERNETES_SECURITY_CONTEXT and METAFLOW_KUBERNETES_POD_SECURITY_CONTEXT

Test plan

- test/unit/test_kubernetes_security_context.py — unit tests for security context parameter handling

🤖 Generated with Claude Code